Query-friendly Compression and Indexing of Recurring Structures in XML Documents
Abstract
XML documents are self-describing by design. To achieve this, XML data is highly verbose and repetitive. Although techniques already exist to compress XML, and text in general, most do not keep the data in a form that is useful to users. We present a technique that exploits recurring structures within an XML document to compress the file in a way that can achieve better compression than other query-friendly compression techniques while keeping the data in a form that supports both querying and indexing. We also present an example implementation of the technique, complete with an index-building mechanism and query-processing capabilities.
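The abstract does not describe the encoding itself, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that a "recurring structure" is the tag skeleton of each child of the root, stores every distinct skeleton once, and keeps only the text values per record. The names `shape`, `compress`, `build_index`, and `query` are hypothetical, and attributes and tail text are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = (
    "<library>"
    "<book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2002</year></book>"
    "<cd><title>C</title></cd>"
    "</library>"
)

def shape(elem):
    """Hashable tag skeleton of an element (structure only, no text)."""
    return (elem.tag, tuple(shape(child) for child in elem))

def values(elem):
    """Text values of an element and its descendants, in pre-order."""
    out = [(elem.text or "").strip()]
    for child in elem:
        out.extend(values(child))
    return out

def tags(skeleton):
    """Tags of a skeleton in the same pre-order used by values()."""
    tag, children = skeleton
    out = [tag]
    for child in children:
        out.extend(tags(child))
    return out

def compress(xml_text):
    """Store each distinct skeleton once; each record keeps only an id and its values."""
    root = ET.fromstring(xml_text)
    shape_ids = {}   # skeleton -> id (insertion order gives the id)
    records = []     # (shape_id, [values])
    for child in root:
        sid = shape_ids.setdefault(shape(child), len(shape_ids))
        records.append((sid, values(child)))
    return root.tag, list(shape_ids), records

def build_index(shapes):
    """Map each tag to the (shape_id, value slot) pairs where it occurs."""
    index = defaultdict(list)
    for sid, skeleton in enumerate(shapes):
        for slot, tag in enumerate(tags(skeleton)):
            index[tag].append((sid, slot))
    return index

def query(tag, index, records):
    """Return all text values stored under `tag` without rebuilding the XML."""
    slots = defaultdict(list)
    for sid, slot in index.get(tag, []):
        slots[sid].append(slot)
    return [vals[slot] for sid, vals in records for slot in slots.get(sid, [])]
```

In this sketch the two `book` records share one stored skeleton, and a lookup such as `query("title", build_index(shapes), records)` reads values straight from the compressed records by slot position. A real system would also need to handle attributes, mixed content, and nested repetition.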
Similar resources
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
A Generic Framework for Querying and Updating Secondary XML Index Structures
To cope with the increasing number and size of XML documents, XML databases provide index structures to accelerate queries on the content and structure of documents. To adapt indices to the query workload, XML databases require various secondary index structures. This paper presents a generic index framework called sciens (Structure and Content Indexing with Extensible, Nestable Structures). In...
XIQS: An XML Indexing and Query System
Retrieval from XML data sets is an actively researched field that presents some different problems from retrieval of relational databases. The challenges stem from the characteristics of the tree structures of XML data. In this paper we present a system, XIQS, for XML query processing with an indexing strategy. Internal data structures are built based on the data type definitions (DTD) of the X...
XSeq: An Index Infrastructure for Tree Pattern Queries
Given a tree-pattern query, most XML indexing approaches decompose it into multiple sub-queries, and then join their results to provide the answer to the original query. Join operations have been identified as the most time-consuming component in XML query processing. XSeq is a powerful XML indexing infrastructure which makes tree patterns a first class citizen in XML query processing. Unlike m...
Indexing and Searching XML Documents Based on Content and Structure Synopses
We present a novel framework for indexing and searching schema-less XML documents based on concise summaries of their structural and textual content. Our search query language is XPath extended with full-text search. We introduce two novel data synopsis structures that correlate textual with positional information in an XML document and improve query precision. In addition, we present a two-ph...